Closed Bug 1643689 Opened 5 years ago Closed 5 years ago

Enable manifest-scheduling on autoland

Categories: Firefox Build System :: Task Configuration, task, P1

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: ahal, Assigned: ahal

References: Blocks 1 open bug, Regressed 1 open bug

Attachments: 9 files (each 47 bytes, text/x-phabricator-request)

Now that the initial implementation of 'manifest-scheduling' has landed, this bug will track turning it on for autoland.

Solving backfills will be the major blocker here, though we'll also need to ensure we don't regress Push Health in a major way.

To avoid regressions in the quality of sheriffs' classifications, we should probably:

Summary: Enable the 'bugbug' manifest loader on autoland → Enable manifest-scheduling on autoland
Depends on: 1654591
Depends on: 1639873

Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.

Assignee: nobody → ahal
Status: NEW → ASSIGNED
Keywords: leave-open

We're planning to enable this tomorrow for a trial run to get a sense of:

A) Is everything working as it should (since this is hard to test on try).
B) How much of an impact does this have on sheriffing (and what we need to do to fix it).

We'll run the experiment until Thursday, July 30th, or until it's obvious that it makes sheriffing too difficult (which may only take an hour or two). If it turns out that sheriffs have no complaints and everything goes smoothly, there's a small chance that we'll leave it enabled past the end date.

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9be5f086895c [taskgraph] enable manifest-scheduling on autoland, r=marco

Backed out changeset 9be5f086895c (bug 1643689) for busting the gecko decision task and causing bug 1655807

Backout link: https://hg.mozilla.org/integration/autoland/rev/153accc0eb12651fa1b2d19ec1dc89c6cc6477d3

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=311287166&repo=autoland

...
[task 2020-07-28T16:24:16.329Z] Generating tasks for release-update-verify-next firefox-next-win32
[task 2020-07-28T16:24:16.329Z] Generated 0 tasks for kind release-update-verify-next
[task 2020-07-28T16:24:16.369Z] Generating full task graph
[task 2020-07-28T16:24:16.448Z] Full task graph contains 24419 tasks and 105201 dependencies
[task 2020-07-28T16:24:21.768Z] PERFHERDER_DATA: {"suites": [{"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 20.07702398099991, "name": "bugbug_push_schedules_time"}, {"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 2, "name": "bugbug_push_schedules_retries"}], "framework": {"name": "build_metrics"}}
[task 2020-07-28T16:24:21.768Z] Traceback (most recent call last):
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/mach_commands.py", line 205, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z]     return taskgraph.decision.taskgraph_decision(options)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 251, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z]     full_task_json = tgg.full_task_graph.to_json()
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 163, in full_task_graph
[task 2020-07-28T16:24:21.768Z]     return self._run_until('full_task_graph')
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 374, in _run_until
[task 2020-07-28T16:24:21.768Z]     k, v = next(self._run)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 304, in _run
[task 2020-07-28T16:24:21.768Z]     yield verifications('full_task_graph', full_task_graph, graph_config, parameters)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 58, in __call__
[task 2020-07-28T16:24:21.768Z]     parameters=parameters,
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 364, in verify_test_packaging
[task 2020-07-28T16:24:21.768Z]     raise Exception("\n".join(exceptions))
[task 2020-07-28T16:24:21.768Z] Exception: Build job build-linux64-tsan/opt has no tests, but specifies MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment. Unset MOZ_AUTOMATION_PACKAGE_TESTS in the task definition to fix.
[taskcluster 2020-07-28 16:24:23.303Z] === Task Finished ===
[taskcluster 2020-07-28 16:24:44.491Z] Unsuccessful task run with exit code: 1 completed in 198.821 seconds
Flags: needinfo?(ahal)

We decided to back out the trial. The issue happened because the algorithm decided no tests needed to run against that build, which tripped this check here:
https://searchfox.org/mozilla-central/rev/d9f92154813fbd4a528453c33886dc3a74f27abb/taskcluster/taskgraph/util/verify.py#358

I think we may need to disable this check if manifest-scheduling mode is enabled.
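
A minimal sketch of what skipping the check in manifest-scheduling mode could look like. This is not the code in verify.py: the function signature, the task shapes, and the assumption that a 'test_manifest_loader' value other than "default" means manifest-scheduling is active are all hypothetical and used only for illustration.

```python
# Hypothetical, simplified stand-in for the packaging check.
# The data shapes and the "default" loader name are assumptions.

def verify_test_packaging(builds, tests, parameters):
    """Raise if a build packages tests that no test task consumes.

    builds: dict mapping build label -> environment dict
    tests: iterable of dicts with a "build" key naming their build
    parameters: dict; "test_manifest_loader" is assumed to be "default"
        unless manifest-scheduling is active
    """
    # With manifest-scheduling the scheduler may legitimately select zero
    # tests for a build, so a build without tests is not treated as an error.
    loader = parameters.get("test_manifest_loader", "default")
    missing_tests_allowed = loader != "default"

    builds_with_tests = {t["build"] for t in tests}
    exceptions = []
    for label, env in sorted(builds.items()):
        packages_tests = env.get("MOZ_AUTOMATION_PACKAGE_TESTS") == "1"
        if packages_tests and label not in builds_with_tests:
            if not missing_tests_allowed:
                exceptions.append(
                    "Build job {} has no tests, but specifies "
                    "MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment.".format(label)
                )
    if exceptions:
        raise Exception("\n".join(exceptions))
```

In this sketch, a build with no selected tests still fails the check under the default loader, but passes when the loader is set to 'bugbug'.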

Flags: needinfo?(ahal)
Depends on: 1655978
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9e7c9323a832 [taskgraph] enable manifest-scheduling on autoland, r=marco

disable 1st round of manifest scheduling

Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0bd8e8a498b1 disable 1st round of manifest scheduling. r=aryx

The dict needs to be passed to the last two substrategies, not just the last
one.
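
To illustrate the fix described above, here is a rough sketch of an argument splitter that hands a dict to the last two substrategies instead of only the last one. The function name `split_args`, the idea that both receive the same dict object, and the strategy names in the usage note are assumptions; the real 'split_bugbug_args' may differ.

```python
def split_args(arg, substrategies):
    """Return one argument per substrategy.

    Assumption: the last two substrategies communicate through a dict, so
    both receive the same dict object; all earlier ones receive `arg`.
    """
    if len(substrategies) < 2:
        raise ValueError("expected at least two substrategies")
    shared = {}  # passed by reference to both trailing substrategies
    return [arg] * (len(substrategies) - 2) + [shared, shared]
```

For example, `split_args("files-changed", ["a", "b", "c"])` would give the first substrategy the original argument and the last two a shared empty dict.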

Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.

Depends on D90159

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10110918b6c0 [taskgraph] Fix error in 'split_bugbug_args', r=marco
https://hg.mozilla.org/integration/autoland/rev/0b196026ed59 [taskgraph] enable manifest-scheduling on autoland, r=marco
Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/081af369ed79 Backed out changeset 0b196026ed59 for causing issues with manifest scheduling.

This was causing |mach try auto| to stop selecting manifests.

Regressions: 1665585
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f07222b728fa [taskgraph] Fix error in 'split_bugbug_args', r=marco
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bb594cb9abe3 [taskgraph] Fix taskgraph tests broken by f07222b728fa,

When enabling manifest scheduling, several interdependencies between tests were
revealed, resulting in too many new intermittents. Make sure we disable
manifest-scheduling for the affected suite ('mochitest-a11y', per the commit
title) for now.
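
One way to picture a per-task opt-out is a lookup that prefers a task-level setting over the globally configured loader. The sketch below is hypothetical: the task key "test-manifest-loader", the helper name, and the "default" loader value are assumptions, not the taskgraph implementation.

```python
def resolve_manifest_loader(task, parameters):
    """Pick the manifest loader for a task.

    A task-level "test-manifest-loader" entry (hypothetical key) wins over
    the globally configured parameters["test_manifest_loader"].
    """
    override = task.get("test-manifest-loader")
    if override is not None:
        return override
    return parameters.get("test_manifest_loader", "default")
```

Under these assumptions, a suite with interdependent tests could pin itself to the default loader while every other suite follows the autoland-wide setting.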

Depends on D91588

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6c2a31b47d0b [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r=jmaher
https://hg.mozilla.org/integration/autoland/rev/50195a6883bf [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/2912d91dd291 [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r=jmaher

This is very bizarre, I couldn't reproduce on try and I can't reproduce locally. Even when on the exact same base revision and using parameters.yml from autoland...

I also tried running it with an earlier Python version in case that was the issue, but still no luck.

Flags: needinfo?(ahal)

facepalm

It's because I had already fixed the issue locally, but I guess I never ended up submitting the changes to Phabricator.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/906c9cf29da7 [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r=jmaher
https://hg.mozilla.org/integration/autoland/rev/1b0858fe5cf2 [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/0cceb980f44e [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r=jmaher
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/23bb4efd11b9 [taskgraph] enable manifest-scheduling on autoland, r=marco

I believe we are all done here. Regressions / follow-up work is all tracked in other bugs.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED